Deriving a bi-lingual dictionary from raw transcription data
Abstract
We present a bigram-based method for deriving bi-lingual dictionary entries from two corpora of spontaneous speech, as represented in transcriptions. In contrast to, e.g., [1], our method does not require translated or otherwise aligned texts; the corpora representing the source and target languages may be unrelated with respect to size, vocabulary richness, frequency distribution, and activity type. Examples are given using Danish and Swedish transcription data (and hints of English). We conclude with a discussion of the use of corpus-driven methods in language preservation and literation projects.
Similar resources
Synset Assignment for Bi-lingual Dictionary with Limited Resource
This paper explores an automatic WordNet synset assignment to the bi-lingual dictionaries of languages having limited lexicon information. Generally, a term in a bilingual dictionary is provided with very limited information such as part-of-speech, a set of synonyms, and a set of English equivalents. This type of dictionary is comparatively reliable and can be found in an electronic form from v...
Creating bilingual lexica using reference wordlists for alignment of monolingual semantic vector spaces
This paper proposes a novel method for automatically acquiring multilingual lexica from non-parallel data and reports some initial experiments to prove the viability of the approach. Using established techniques for building mono-lingual vector spaces, two independent semantic vector spaces are built from textual data. These vector spaces are related to each other using a small reference word li...
Man-Aided Computer, Translation from English into French using an on-Line System to Manipulate a Bi-Lingual Conceptual Dictionary, or Thesaurus
Comparing Multiple Methods for Japanese and Japanese-English Text Retrieval
The NACSIS collection of Japanese scientific documents (with English titles) provides a solid foundation for information retrieval research into 1) segmentation methods for Japanese text, 2) effective methods for monolingual Japanese retrieval, and 3) Japanese-English cross-language retrieval. This paper compares multiple methods for Japanese and Japanese-English text retrieval. Our focus is on ac...
Dictionary acquisition using parallel text and co-occurrence statistics
We present a simple and efficient approach for deriving bilingual dictionaries from sentence-aligned parallel text by extending the notion of co-occurrences to a cross-lingual setting. Dictionaries are evaluated against gold standards and manually; the analysis accounts for frequency and corpus size effects.
Publication date: 2005